33 research outputs found

    Neural Distributed Compressor Discovers Binning

    We consider lossy compression of an information source when the decoder has lossless access to a correlated one. This setup, also known as the Wyner-Ziv problem, is a special case of distributed source coding. To this day, practical approaches for the Wyner-Ziv problem have neither been fully developed nor heavily investigated. We propose a data-driven method based on machine learning that leverages the universal function approximation capability of artificial neural networks. We find that our neural network-based compression scheme, based on variational vector quantization, recovers some principles of the optimum theoretical solution of the Wyner-Ziv setup, such as binning in the source space as well as optimal combination of the quantization index and side information, for exemplary sources. These behaviors emerge although no structure exploiting knowledge of the source distributions was imposed. Binning is a widely used tool in information theoretic proofs and methods, and to our knowledge, this is the first time it has been explicitly observed to emerge from data-driven learning.
    Comment: draft of a journal version of our previous ISIT 2023 paper (available at: arXiv:2305.04380). arXiv admin note: substantial text overlap with arXiv:2305.0438

    The Unreasonable Effectiveness of Linear Prediction as a Perceptual Metric

    We show how perceptual embeddings of the visual system can be constructed at inference-time with no training data or deep neural network features. Our perceptual embeddings are solutions to a weighted least squares (WLS) problem, defined at the pixel-level, and solved at inference-time, that can capture global and local image characteristics. The distance in embedding space is used to define a perceptual similarity metric which we call LASI: Linear Autoregressive Similarity Index. Experiments on full-reference image quality assessment datasets show LASI performs competitively with learned deep feature based methods like LPIPS (Zhang et al., 2018) and PIM (Bhardwaj et al., 2020), at a similar computational cost to hand-crafted methods such as MS-SSIM (Wang et al., 2003). We found that increasing the dimensionality of the embedding space consistently reduces the WLS loss while increasing performance on perceptual tasks, at the cost of increasing the computational complexity. LASI is fully differentiable, scales cubically with the number of embedding dimensions, and can be parallelized at the pixel-level. A Maximum Differentiation (MAD) competition (Wang & Simoncelli, 2008) between LASI and LPIPS shows that both methods are capable of finding failure points for the other, suggesting these metrics can be combined.

    Median Trilateral Loop Filter for Depth Map Video Coding

    Emerging extensions to conventional stereo video technologies like 3D Video require adding depth information to 2D video data. This supplementary data needs to be coded efficiently and transmitted to the receiver where arbitrary viewpoints are generated by using this additional information. The depth maps are characterized by piecewise smooth regions, which are bounded by sharp edges describing depth discontinuities along object boundaries. Preserving these characteristics and especially depth discontinuities is a crucial requirement for depth map coding. When coding depth maps by means of a conventional hybrid video coder, ringing artifacts are introduced along the sharp edges and result in quality degradation when using the reconstructed depth maps for view synthesis. To reduce these ringing artifacts and also to better align object boundaries in video and depth data, a new in-loop filter is proposed, which reconstructs the described characteristics of depth maps